1.
BMC Genomics ; 25(1): 405, 2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38658835

ABSTRACT

Graph-based pangenomes are gaining popularity over linear pangenomes because they store more comprehensive information about variations. However, traditional linear genome browsers have their own advantages, especially the tremendous resources accumulated historically. With the fast-growing number of individual genomes and annotations available, demand is rising for a genome browser that visualizes genome annotations for many individuals together with a graph-based pangenome. Here we report PPanG, a precise pangenome browser enabling nucleotide-level comparison of individual genome annotations together with a graph-based pangenome. Nine rice genomes with annotations are provided by default as potential references, and any individual genome can be selected as the reference. Our pangenome browser provides unprecedented insights into genome variations at different levels from base to gene, and reveals how the structure of a gene can differ between individuals. PPanG can be applied to any species with multiple individual genomes available and is available at https://cgm.sjtu.edu.cn/PPanG .


Subjects
Genomics , Genomics/methods , Oryza/genetics , Molecular Sequence Annotation , Plant Genome , Genetic Variation , Software , Web Browser , Genetic Databases , Nucleotides/genetics , Genome
3.
Med Chem ; 20(2): 140-152, 2024.
Article in English | MEDLINE | ID: mdl-37957859

ABSTRACT

BACKGROUND: The epidermal growth factor receptor (EGFR) protein has been intensively studied as a therapeutic target for non-small cell lung cancer (NSCLC). Aminobenzimidazole derivatives, as fourth-generation EGFR inhibitors, have achieved promising results and overcome EGFR mutations at C797S, del19 and T790M in NSCLC. OBJECTIVE: To understand the quantitative structure-activity relationship (QSAR) of aminobenzimidazole derivatives as EGFRdel19 T790M C797S inhibitors, four-dimensional QSAR (4D-QSAR) and multivariate image analysis QSAR (MIA-QSAR) were performed on data for 45 known aminobenzimidazole derivatives. METHODS: The 4D-QSAR descriptors were acquired by calculating the association energies between probes and aligned conformational ensemble profiles (CEP), and the regression models were established by partial least squares (PLS). To further understand and verify the 4D-QSAR model, an MIA-QSAR model was constructed by using chemical structure images to generate descriptors, followed by PLS regression. Furthermore, molecular docking and averaged noncovalent interactions (aNCI) analysis were performed to further understand the interactions between ligands and the EGFR targets; the results were in good agreement with the 4D-QSAR model. RESULTS: The established 4D-QSAR and MIA-QSAR models have strong stability and good external prediction ability. CONCLUSION: These results provide theoretical guidance for the research and development of aminobenzimidazole derivatives as new EGFRdel19 T790M C797S inhibitors.


Subjects
Non-Small-Cell Lung Carcinoma , Lung Neoplasms , Humans , Quantitative Structure-Activity Relationship , Molecular Docking Simulation , ErbB Receptors/genetics , Protein Kinase Inhibitors/pharmacology , Protein Kinase Inhibitors/chemistry , Mutation , Antineoplastic Drug Resistance
4.
J Org Chem ; 88(19): 13946-13955, 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37676850

ABSTRACT

This study reports the visible-light-driven [2 + 2] photocycloaddition of 1,4-dihydropyrazines in solution. N,N'-Diacyl-1,4-dihydropyrazines with different substituents showed completely different reactivity under irradiation with a 430 nm blue light-emitting diode (LED) lamp. N,N'-Diacetyl-1,4-dihydropyrazine and N,N'-dipropionyl-1,4-dihydropyrazine were the only compounds capable of undergoing a [2 + 2] photocycloaddition reaction, yielding syn-dimers and cage-dimers (known as 3,6,9,12-tetraazatetraasteranes) with overall yields of 76 and 83%, respectively. The substituent-reactivity effect on the [2 + 2] photocycloaddition of N,N'-diacyl-1,4-dihydropyrazines was investigated by density functional theory calculations. The results show that the substituents have little influence on the Gibbs free energy of the [2 + 2] photocycloaddition and mainly affect the excitation energy, reaction sites, and triplet excited-state structures of 1,4-dihydropyrazines, which are closely related to whether the reaction occurs. The results offer insights into the photochemical reactivity of 1,4-dihydropyrazines and an approach for constructing dimers of N,N'-diacyl-1,4-dihydropyrazines through a solution-based visible-light-driven [2 + 2] photocycloaddition, especially for the construction of 3,6,9,12-tetraazatetraasteranes. Compared with the solid-state [2 + 2] photocycloaddition of 1,4-dihydropyrazine, this photocycloaddition is an efficient and environmentally friendly method for synthesizing tetraazatetraasteranes, with the advantages of milder reaction conditions, simple operation, and adjustable reaction amounts because the cocrystal growth step is omitted.

6.
Chem Biol Drug Des ; 101(3): 568-580, 2023 03.
Article in English | MEDLINE | ID: mdl-36112079

ABSTRACT

In our research on novel anticancer agents, a series of N6-hydrazone purine derivatives were designed and synthesized by analysis of a pharmacophore model for ATP-competitive inhibitors. Activity screening showed that N6-hydrazone purine derivatives 21 and 26 not only had antiproliferative activity against the A549 and MCF-7 cell lines comparable to that of Vandetanib, the positive control, but also had moderate antiplatelet aggregation activity. To investigate the possible targets, a molecular docking study was carried out on fourteen kinases associated with anticancer and antiplatelet aggregation activities. The results indicated that compounds 21 and 26 had the potential to target the VEGFR-2, PI3Kα, EGFR, and HER2 kinases. The kinase inhibition assay showed that compound 26 could target VEGFR-2, PI3Kα, and EGFR (IC50 = 0.822, 3.040 and 6.625 µM, respectively). Together, the results indicate that compound 26 is an encouraging framework for potential new multi-target anticancer agents with antiplatelet aggregation activity.


Subjects
Antineoplastic Agents , Vascular Endothelial Growth Factor Receptor-2 , Humans , Structure-Activity Relationship , Vascular Endothelial Growth Factor Receptor-2/metabolism , Molecular Docking Simulation , Cell Proliferation , Hydrazones/pharmacology , Antitumor Drug Screening Assays , Antineoplastic Agents/pharmacology , ErbB Receptors/metabolism , Purines/pharmacology , Drug Design , Protein Kinase Inhibitors/pharmacology , Molecular Structure
7.
Front Microbiol ; 13: 948138, 2022.
Article in English | MEDLINE | ID: mdl-36081802

ABSTRACT

Lactococcus lactis (L. lactis) is a well-isolated and well-cultured lactic acid bacterium, but genome-based analysis of this taxon would be incomplete if it relied on isolate genomes alone, because there are still uncultured strains in some ecological niches. In this study, we recovered 93 high-quality metagenome-assembled genomes (MAGs) of L. lactis from food and human gut metagenomes with a culture-independent method. We then constructed a unified genome catalog of L. lactis by integrating these MAGs with 70 publicly available isolate genomes. With this comprehensive resource, we assessed the genomic diversity and phylogenetic relationships to further explore the genetic and functional properties of L. lactis. An open pangenome of L. lactis was generated using our genome catalog, consisting of 13,066 genes in total, of which 5,448 genes were not identified in the isolate genomes. The core genome-based phylogenetic analysis showed that the L. lactis strains we collected were separated into two main subclades corresponding to two subspecies, with some uncultured phylogenetic lineages discovered. The species disparity was also indicated in a PCA based on accessory genes of our pangenome. These analyses shed further light on the unexpectedly high diversity within the taxon at both the genome and gene levels and gave clues about its population structure and evolution. Lactococcus lactis has a long history of safe use in food fermentations and is considered one of the important probiotic microorganisms. Obtaining the complete genetic information of L. lactis is important to the food and health industry. However, it can naturally inhabit many environments other than dairy products, including drain water and human gut samples. Here we presented an open pan-genome of L. lactis constructed from 163 high-quality genomes obtained from various environments, including MAGs recovered from environmental metagenomes and isolate genomes.
This study expanded the genetic information of L. lactis by about one third, including more than 5,000 novel genes found in uncultured strains. This more complete gene repertoire of L. lactis is crucial to further understanding its genetic and functional properties. These properties may be harnessed to impart additional value to dairy fermentation or other industries.

8.
Nat Commun ; 13(1): 5412, 2022 09 15.
Article in English | MEDLINE | ID: mdl-36109518

ABSTRACT

Pangenomic study might improve the completeness of the human reference genome (GRCh38) and promote precision medicine. Here, we use an automated pipeline for human pangenomic analysis to build a gastric cancer pan-genome from 185 paired deep sequencing datasets (370 samples), and characterize gene presence-absence variations (PAVs) at the whole-genome level. The genes ACOT1, GSTM1, SIGLEC14 and UGT2B17 are identified as highly absent genes in the gastric cancer population. A set of genes is predicted from sequences that do not align to GRCh38. We successfully locate one of the predicted genes, GC0643, on chromosome 9q34.2. Overexpression of GC0643 significantly inhibits cell growth, cell migration and invasion, and cell cycle progression, and induces apoptosis in cancer cells. These tumor suppressor functions can be reversed by shGC0643 knockdown. GC0643 is approved by the NCBI database (GenBank: MW194843.1). Collectively, the robust pan-genome strategy provides a deeper understanding of gene PAVs in the human cancer genome.


Subjects
Stomach Neoplasms , Asian People/genetics , China , Human Genome , Humans , Lectins/genetics , Cell Surface Receptors/genetics , Stomach Neoplasms/genetics
9.
Genome Res ; 32(5): 853-863, 2022 05.
Article in English | MEDLINE | ID: mdl-35396275

ABSTRACT

The concept of the pan-genome, the collection of all genomes from a population, has shown great potential in genomics, especially for crop sciences. The rice pan-genome constructed from second-generation sequencing (SGS) data is about 270 Mb larger than Nipponbare, the rice reference genome (NipRG), but it is still disadvantaged by incompleteness and loss of genomic context. Third-generation sequencing (TGS) with long reads can help to construct better pan-genomes. In this paper, we report a high-quality rice pan-genome construction method that introduces a series of new steps to deal with long-read data, including unmapped sequence block filtering, redundancy removal, and sequence block elongation. Compared to NipRG, the long-read sequencing-based pan-genome constructed from 105 rice accessions contains 604 Mb of novel sequences and is much more comprehensive than the one constructed from ∼3000 rice genomes sequenced with short reads. Repetitive sequences are the main components of the novel sequences, which partially explains the differences between the pan-genomes based on TGS and SGS. With six wild rice accessions added, the rice pan-genome contains about 879 Mb of novel sequences and 19,000 novel genes in total. In addition, we have created high-quality reference genomes for all representative rice populations, including five gapless reference genomes. This study has made significant progress in our understanding of the rice pan-genome, and this pan-genome construction method for long-read data can be applied to accelerate a broad range of genomics studies.


Subjects
Oryza , Genome , Genomics/methods , High-Throughput Nucleotide Sequencing , Oryza/genetics , DNA Sequence Analysis
10.
Genome Biol ; 23(1): 94, 2022 04 14.
Article in English | MEDLINE | ID: mdl-35422001

ABSTRACT

The analysis of microbiome data has several technical challenges. In particular, count matrices contain a large proportion of zeros, some of which are biological, whereas others are technical. Furthermore, the measurements suffer from unequal sequencing depth, overdispersion, and data redundancy. These nuisance factors introduce substantial noise. We propose an accurate and robust method, mbDenoise, for denoising microbiome data. Assuming a zero-inflated probabilistic PCA (ZIPPCA) model, mbDenoise uses variational approximation to learn the latent structure and recovers the true abundance levels using the posterior, borrowing information across samples and taxa. mbDenoise outperforms state-of-the-art methods to extract the signal for downstream analyses.


Subjects
Microbiota , Statistical Models , Principal Component Analysis , Research Design
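
The distinction the mbDenoise abstract above draws between two kinds of zeros can be illustrated with a plain zero-inflated Poisson mixture. The sketch below is not the ZIPPCA model or the mbDenoise code: the Poisson count distribution, the parameter names `lam` and `pi`, and the function names are assumptions chosen for illustration. It computes the probability that an observed zero comes from the excess-zero component rather than from the count distribution.

```python
import math


def zip_pmf(x: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson probability of observing count x.

    lam: mean of the Poisson component (assumed, illustrative).
    pi:  probability of the excess-zero component (assumed, illustrative).
    """
    poisson = math.exp(-lam) * lam ** x / math.factorial(x)
    excess_zero = pi if x == 0 else 0.0
    return excess_zero + (1.0 - pi) * poisson


def prob_excess_zero(lam: float, pi: float) -> float:
    """Posterior probability that an observed zero is an excess zero."""
    return pi / (pi + (1.0 - pi) * math.exp(-lam))


if __name__ == "__main__":
    # A taxon with a high expected count: an observed zero is almost
    # certainly an excess zero rather than a Poisson zero.
    print(prob_excess_zero(lam=10.0, pi=0.3))
    # A taxon with a low expected count: the zero is far more ambiguous.
    print(prob_excess_zero(lam=0.1, pi=0.3))
```

The posterior depends on the expected abundance, which is why a method of this kind needs to borrow information across samples and taxa to estimate it.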
11.
Int J Biol Sci ; 17(14): 3717-3727, 2021.
Article in English | MEDLINE | ID: mdl-34671195

ABSTRACT

SARS-CoV-2 belongs to the coronavirus family. Comparing the genomic features of viral genomes across the coronavirus family can improve our understanding of SARS-CoV-2. Here we present the first pan-genome analysis of 3,932 whole genomes of 101 species from 4 genera of the coronavirus family. We found a total of 181 genes in the pan-genome of the coronavirus family, among which only 3 genes, the S gene, M gene and N gene, are highly conserved. We also constructed a pan-genome from 23,539 whole genomes of SARS-CoV-2. There are 13 genes in total in the SARS-CoV-2 pan-genome, and all 13 are core genes for SARS-CoV-2. The pan-genome of coronaviruses shows a lower level of diversity than the pan-genomes of other RNA viruses, which contain no core gene. The three highly conserved genes in the coronavirus family, which are also core genes in the SARS-CoV-2 pan-genome, could be potential targets for developing nucleic acid diagnostic reagents with a decreased possibility of cross-reaction with other coronavirus species.


Subjects
Coronaviridae/genetics , Viral Genome , Phylogeny
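
The core versus accessory gene split that the pan-genome abstracts in this list rely on can be sketched as a simple count over gene presence per genome. This is a minimal illustration, not the pipeline used in any of the listed studies: the function name, the `core_fraction` threshold, and the toy genome and gene labels are all hypothetical.

```python
from typing import Dict, Set, Tuple


def split_pangenome(
    genomes: Dict[str, Set[str]], core_fraction: float = 1.0
) -> Tuple[Set[str], Set[str]]:
    """Split the union of all genes into core and accessory sets.

    A gene is called core when it is present in at least `core_fraction`
    of the genomes (1.0 means present in every genome).
    """
    if not genomes:
        return set(), set()
    counts: Dict[str, int] = {}
    for genes in genomes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    total = len(genomes)
    core = {g for g, c in counts.items() if c / total >= core_fraction}
    accessory = set(counts) - core
    return core, accessory


# Toy presence data with invented labels, for illustration only.
toy = {
    "genome_1": {"g1", "g2", "g3", "g4"},
    "genome_2": {"g1", "g2", "g3"},
    "genome_3": {"g1", "g2", "g3", "g5"},
}
core, accessory = split_pangenome(toy)
print(sorted(core))       # ['g1', 'g2', 'g3']
print(sorted(accessory))  # ['g4', 'g5']
```

Lowering `core_fraction` below 1.0 gives a "soft core" definition, which is a common choice when some assemblies are incomplete.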
12.
Biol Pharm Bull ; 44(7): 999-1006, 2021.
Article in English | MEDLINE | ID: mdl-34193695

ABSTRACT

Flavonoids are promising natural compounds with antioxidant activity and acetylcholinesterase (AChE) inhibitory activity for treating Alzheimer's disease (AD). In the present study, in line with our interest in flavonoid derivatives as AChE inhibitors, a four-dimensional quantitative structure-activity relationship (4D-QSAR) molecular model was proposed. The data required to perform the 4D-QSAR analysis comprised 52 compounds reported in the literature, mostly analogs, and their biological activities measured in a common assay. The model was generated by a complete 4D-QSAR program suite written by our group. The best model was found after multiple trials. It had good predictive ability, with a cross-validation correlation coefficient Q2 = 0.77, an internal validation correlation coefficient R2 = 0.954, and an external validation correlation coefficient R2pred = 0.715. Molecular docking analysis was also carried out to better understand the interactions between flavonoids and the AChE targets, and was in good agreement with the 4D-QSAR model. Based on the information provided by the 4D-QSAR model and the molecular docking analysis, ideas for optimizing the structures of flavonoids as AChE inhibitors were put forward, which may provide theoretical guidance for the research and development of new AChE inhibitors.


Subjects
Cholinesterase Inhibitors/chemistry , Flavonoids/chemistry , Molecular Models , Quantitative Structure-Activity Relationship
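
The validation statistics quoted in the abstract above (Q2 from cross-validation, R2 from the fitted data) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses leave-one-out cross-validation and ordinary least squares on a single descriptor instead of the PLS regression and 4D-QSAR descriptors used in the study, and the function names and numbers are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for a single descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(ys: List[float], preds: List[float]) -> float:
    """1 - (residual sum of squares / total sum of squares)."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q2_leave_one_out(xs: List[float], ys: List[float]) -> float:
    """Cross-validated Q2: each point is predicted by a model fitted without it."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return r_squared(ys, preds)


# Invented descriptor/activity values, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
slope, intercept = fit_line(xs, ys)
r2 = r_squared(ys, [slope * x + intercept for x in xs])
q2 = q2_leave_one_out(xs, ys)
print(r2, q2)
```

For a least-squares model, Q2 computed this way is lower than the fitted R2 whenever the fit is not exact, which matches the ordering of the values reported in the abstract.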
13.
Hum Mol Genet ; 30(22): 2110-2122, 2021 11 01.
Article in English | MEDLINE | ID: mdl-34196368

ABSTRACT

The well-established functions of UHRF1 converge to DNA biological processes, as exemplified by DNA methylation maintenance and DNA damage repair during cell cycles. However, the potential effect of UHRF1 on RNA metabolism is largely unexplored. Here, we revealed that UHRF1 serves as a novel alternative RNA splicing regulator. The protein interactome of UHRF1 identified various splicing factors. Among them, SF3B3 could interact with UHRF1 directly and participate in UHRF1-regulated alternative splicing events. Furthermore, we interrogated the RNA interactome of UHRF1, and surprisingly, we identified U snRNAs, the canonical spliceosome components, in the purified UHRF1 complex. Unexpectedly, we found H3R2 methylation status determines the binding preference of U snRNAs, especially U2 snRNAs. The involvement of U snRNAs in UHRF1-containing complex and their binding preference to specific chromatin configuration imply a finely orchestrated mechanism at play. Our results provided the resources and pinpointed the molecular basis of UHRF1-mediated alternative RNA splicing, which will help us better our understanding of the physiological and pathological roles of UHRF1 in disease development.


Subjects
Alternative Splicing , CCAAT-Enhancer-Binding Proteins/metabolism , Histones/metabolism , RNA Splicing Factors/metabolism , Small Nuclear RNA/genetics , Ubiquitin-Protein Ligases/metabolism , CCAAT-Enhancer-Binding Proteins/genetics , Humans , Methylation , Multiprotein Complexes , Nucleic Acid Conformation , Protein Binding , Small Nuclear RNA/metabolism , Ubiquitin-Protein Ligases/genetics
14.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34323927

ABSTRACT

With the development of genome-wide association studies, how to gain information from a large scale of data has become an issue of common concern, since traditional methods are not fully developed to solve problems such as identifying loci-to-loci interactions (also known as epistasis). Previous epistatic studies mainly focused on local information with a single outcome (phenotype), while in this paper, we developed a two-stage global search algorithm, Greedy Equivalence Search with Local Modification (GESLM), to implement a global search of directed acyclic graph in order to identify genome-wide epistatic interactions with multiple outcome variables (phenotypes) in a case-control design. GESLM integrates the advantages of score-based methods and constraint-based methods to learn the phenotype-related Bayesian network and is powerful and robust to find the interaction structures that display both genetic associations with phenotypes and gene interactions. We compared GESLM with some common phenotype-related loci detecting methods in simulation studies. The results showed that our method improved the accuracy and efficiency compared with others, especially in an unbalanced case-control study. Besides, its application on the UK Biobank dataset suggested that our algorithm has great performance when handling genome-wide association data with more than one phenotype.


Subjects
Algorithms , Genome-Wide Association Study , Phenotype , Single Nucleotide Polymorphism , Bayes Theorem , Datasets as Topic , Humans
15.
J Cell Biochem ; 122(10): 1428-1434, 2021 10.
Article in English | MEDLINE | ID: mdl-34132422

ABSTRACT

Interpreting functional analysis results derived from environmental samples using direct sequencing meta-omics data, including metagenomics and meta-transcriptomics data, is challenging due to their complexity. Visualization of functional analysis results can help researchers discover relevant biological insights. Despite the availability of many R packages, there is a lack of interactive and comprehensive graphic systems for displaying functional terms and the corresponding genes in meta-omics analysis results. Here, we present ivTerm, an R-shiny package with a user-friendly graphical interface that enables users to inspect functional annotations, compare results across multiple experiments, create customized charts, and download these charts. It provides various basic and innovative chart types to visualize functional terms and the genes involved. Users can also browse descriptions of terms, which are obtained automatically from the database web servers. Two examples, a metagenome analysis dataset for the human gut and a meta-transcriptome dataset for coral symbiomes, are given to show the usage of ivTerm. Finally, we compared ivTerm with existing tools with similar functions, such as GOplot, ViSEAGO, and Chordomics. ivTerm is convenient and efficient for biologists to gain an integrated view and develop deep insights by interactive analysis of meta-omics data. It can accelerate the process of developing insights from complex meta-omics data. The code for ivTerm is freely available at https://github.com/SJTU-CGM/ivTerm.


Subjects
Computational Biology/methods , Computer Graphics , Data Visualization , Software , Statistical Data Interpretation , Factual Databases , Gene Expression Profiling , Gene Regulatory Networks , Genomics/methods , Humans , Metabolomics/methods , Metagenome , Transcriptome
16.
BMC Bioinformatics ; 21(1): 468, 2020 Oct 20.
Article in English | MEDLINE | ID: mdl-33081690

ABSTRACT

BACKGROUND: Current taxonomic classification tools use exact string matching algorithms that are effective for data from next-generation sequencing technology. However, the unique error patterns of third-generation sequencing (TGS) technologies could reduce the accuracy of these programs. RESULTS: We developed a Classification tool using Discriminative K-mers and Approximate Matching algorithm (CDKAM). This approximate matching method was used for searching k-mers and included two phases, a quick mapping phase and a dynamic programming phase. Simulated datasets as well as real TGS datasets were tested to compare the performance of CDKAM with existing methods. We showed that CDKAM performed better in many aspects, especially when classifying TGS data with an average length of 1000-1500 bases. CONCLUSIONS: CDKAM is an effective program with higher accuracy and lower memory requirements for TGS metagenome sequence classification. It produces high species-level accuracy.


Subjects
Algorithms , High-Throughput Nucleotide Sequencing/methods , Humans
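
The idea of classifying error-prone reads with discriminative k-mers and approximate matching, as described in the CDKAM abstract above, can be sketched in a few lines. This is not the CDKAM implementation: it replaces the quick mapping and dynamic programming phases with a brute-force Hamming-distance scan, and the function names, the `max_mismatches` parameter, and the toy sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discriminative_kmers(references: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each k-mer found in exactly one taxon to that taxon."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(taxon)
    return {kmer: next(iter(taxa)) for kmer, taxa in owners.items() if len(taxa) == 1}


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify(read: str, index: Dict[str, str], k: int, max_mismatches: int = 1) -> Optional[str]:
    """Vote per read k-mer: exact hit first, then the first approximate hit."""
    votes: Dict[str, int] = defaultdict(int)
    for kmer in kmers(read, k):
        if kmer in index:
            votes[index[kmer]] += 1
            continue
        # Brute-force approximate matching, standing in for a faster scheme.
        for ref_kmer, taxon in index.items():
            if hamming(kmer, ref_kmer) <= max_mismatches:
                votes[taxon] += 1
                break
    return max(votes, key=votes.get) if votes else None


# Invented reference sequences, for illustration only.
references = {"taxon_A": "ACGTACGTAA", "taxon_B": "TTGGCCAATT"}
index = discriminative_kmers(references, k=4)
print(classify("ACGTTCGT", index, k=4))  # read with one substitution error
print(classify("GGCCAATT", index, k=4))  # error-free read
```

The linear scan over the index is only workable for toy data. A real tool needs an indexed lookup, and handling insertions and deletions requires an edit-distance comparison rather than Hamming distance.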
17.
Front Pharmacol ; 11: 1183, 2020.
Article in English | MEDLINE | ID: mdl-32848786

ABSTRACT

A high serine content in body fluid has been identified in a portion of patients with gastric cancer, but its biological significance is not clear. Here, we investigated the biological effect of serine on gastric cancer cells. Serine was added to the culture medium of MGC803 and HGC27 cancer cells, and its influence on multiple biological functions, such as cell growth, migration and invasion, and drug resistance, was analyzed. We examined the global transcriptomic profiles of these cells cultured with high serine content. Both the MGC803 and HGC27 cell lines originated from male patients; however, their basal gene expression patterns were very different. The finding of the cell differentiation-associated genes ALPI, KRT18, TM4SF1, KRT81, A2M, MT1E, MUC16, BASP1, TUSC3, and PRSS21 in MGC803 cells suggested that this cell line was more poorly differentiated than the HGC27 cell line. When the serine concentration in the medium was increased to 150 mg/ml, the two gastric cancer cell lines responded differently, particularly in cell growth, cell migration and invasion, and 5-FU resistance. In an animal experiment, administration of a high concentration of serine promoted cancer cell metastasis to the local lymph node. Taken together, we characterized the basal gene expression profiles of MGC803 and HGC27. The HGC27 cells were more differentiated than the MGC803 cells, and the MGC803 cells were more sensitive to changes in serine content. Our results suggest that the responsiveness of cancer cells to microenvironmental change is associated with their genetic background.

20.
Genome Biol ; 20(1): 149, 2019 07 31.
Article in English | MEDLINE | ID: mdl-31366358

ABSTRACT

The human reference genome is still incomplete, especially for those population-specific or individual-specific regions, which may have important functions. Here, we developed a HUman Pan-genome ANalysis (HUPAN) system to build the human pan-genome. We applied it to 185 deep sequencing and 90 assembled Han Chinese genomes and detected 29.5 Mb novel genomic sequences and at least 188 novel protein-coding genes missing in the human reference genome (GRCh38). It can be an important resource for the human genome-related biomedical studies, such as cancer genome analysis. HUPAN is freely available at http://cgm.sjtu.edu.cn/hupan/ and https://github.com/SJTU-CGM/HUPAN .


Subjects
Human Genome , Software , Asian People/genetics , Black People/genetics , High-Throughput Nucleotide Sequencing , Humans , Proteins/genetics , DNA Sequence Analysis